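{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Wide and Deep\n", "\n", "```{note}\n", "The deep part is the same as Embedding+MLP; the wide part handles memorization.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Structure\n", "\n", "\n", "\n", "The wide part is on the left and the deep part is on the right.\n", "\n", "Wide part: connects the input layer directly to the output layer, which gives the model strong memorization ability.\n", "\n", "Deep part: a typical embedding + MLP structure, which gives the model strong generalization ability.\n", "\n", "“Memorization” means that the model directly learns the co-occurrence frequency of items or features and uses it directly as the basis for recommendation, for example the rule that users who like movie A also like movie B.\n", "\n", "Rules of this kind have two characteristics: (1) there are a great many of them; (2) they are very specific, so there is no need to cross them with other features.\n", "\n", "In this way, the Wide&Deep model has both memorization and generalization ability." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data preprocessing" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf\n", "from tensorflow import keras\n", "import rec\n", "\n", "# Load the MovieLens dataset\n", "train_dataset, test_dataset = rec.load_movielens()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "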
\n",
"| | movieId | userId | rating | timestamp | label | releaseYear | movieGenre1 | movieGenre2 | movieGenre3 | movieRatingCount | ... | userRatingCount | userAvgReleaseYear | userReleaseYearStddev | userAvgRating | userRatingStddev | userGenre1 | userGenre2 | userGenre3 | userGenre4 | userGenre5 |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 1 | 15555 | 3.0 | 900953740 | 0 | 1995 | Adventure | Animation | Children | 10759 | ... | 92 | 1992 | 8.98 | 3.86 | 0.74 | Drama | Comedy | Thriller | Action | Crime |\n",
"| 1 | 1 | 25912 | 3.5 | 1111631768 | 1 | 1995 | Adventure | Animation | Children | 10759 | ... | 21 | 1988 | 14.09 | 3.48 | 1.28 | Action | Comedy | Romance | Adventure | Thriller |\n",
"| 2 | 1 | 29912 | 3.0 | 866820360 | 0 | 1995 | Adventure | Animation | Children | 10759 | ... | 4 | 1995 | 0.50 | 3.00 | 0.00 | NaN | NaN | NaN | NaN | NaN |\n",
"| 3 | 10 | 17686 | 0.5 | 1195555011 | 0 | 1995 | Action | Adventure | Thriller | 6330 | ... | 35 | 1992 | 8.35 | 2.97 | 1.48 | Comedy | Drama | Adventure | Action | Thriller |\n",
"| 4 | 104 | 20158 | 4.0 | 1155357691 | 1 | 1996 | Comedy | NaN | NaN | 3954 | ... | 81 | 1991 | 8.70 | 3.60 | 0.72 | Thriller | Drama | Action | Crime | Adventure |\n",
"| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
"| 88822 | 968 | 26865 | 3.0 | 854092232 | 0 | 1968 | Horror | Sci-Fi | Thriller | 1824 | ... | 94 | 1991 | 12.23 | 3.35 | 0.85 | Drama | Thriller | Comedy | Crime | Romance |\n",
"| 88823 | 968 | 8507 | 2.0 | 974709061 | 0 | 1968 | Horror | Sci-Fi | Thriller | 1824 | ... | 5 | 1994 | 0.89 | 2.00 | 1.00 | NaN | NaN | NaN | NaN | NaN |\n",
"| 88824 | 969 | 16689 | 5.0 | 857854044 | 1 | 1951 | Adventure | Comedy | Romance | 2380 | ... | 97 | 1992 | 9.95 | 3.53 | 0.82 | Drama | Comedy | Crime | Romance | Thriller |\n",
"| 88825 | 969 | 26460 | 2.0 | 1250279576 | 0 | 1951 | Adventure | Comedy | Romance | 2380 | ... | 55 | 1990 | 11.78 | 2.73 | 1.42 | Thriller | Crime | Drama | Comedy | Sci-Fi |\n",
"| 88826 | 970 | 3033 | 2.0 | 1272394603 | 0 | 1953 | Adventure | Comedy | Crime | 98 | ... | 100 | 1985 | 17.64 | 3.67 | 0.89 | Drama | Romance | Comedy | Thriller | Crime |\n",
"\n",
"88827 rows × 27 columns\n",
"